Conversation

@dungba88
Contributor

@dungba88 dungba88 commented Nov 22, 2024

Description

fixes #13564

Added a new Query which wraps around KnnFloatVectorQuery and does re-ranking for a quantized index using full-precision vectors. The idea is to first run KnnFloatVectorQuery with an over-sampled k (x1.5, x2, x5, etc.), then re-rank the docs using the full-precision (original, non-quantized) vectors, and finally take the top k.

Questions:

  • Should we expose the target inside KnnFloatVectorQuery so that users don't need to pass the target twice? Currently it only exposes getTargetCopy(), which requires an array copy and is thus inefficient, but I assume the intention is to encapsulate the array so that it can't be modified from outside?
  • Maybe out of scope for this PR, but I'm curious what people think about using mlock to prevent the quantized vectors from being swapped out, as loading fp vectors (although only a small set per query) means there will be more pressure on RAM.

Usage:

KnnFloatVectorQuery knnQuery = ...; // create the KnnFloatVectorQuery with some over-sampled k
RerankKnnFloatVectorQuery query = new RerankKnnFloatVectorQuery(knnQuery, targetVector, k);
TopDocs topDocs = searcher.search(query, k);
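
A more concrete sketch of the above (the field name and oversample factor here are illustrative, not part of the API):

int k = 100;
// hypothetical field name; over-sample by 2x for the first pass
KnnFloatVectorQuery knnQuery = new KnnFloatVectorQuery("vector", targetVector, 2 * k);
RerankKnnFloatVectorQuery query = new RerankKnnFloatVectorQuery(knnQuery, targetVector, k);
TopDocs topDocs = searcher.search(query, k);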

@dungba88
Contributor Author

dungba88 commented Nov 22, 2024

The build fails with "The import org.apache.lucene.codecs.lucene100 cannot be resolved"; I thought this was already in mainline. Will check.

Edit: It has been moved to backward codecs. Will use something more stable.

@dungba88
Contributor Author

I have a preliminary benchmark here (top-k=100, fanout=0) using the Cohere 768 dataset.

[benchmark chart: latency vs. recall on Cohere 768, top-k=100, fanout=0]

Anyhow, I can see two things that should be addressed:

  • If we access the full-precision vectors, it will evict memory that was allocated (either through preloading or through mmap) for the quantized vectors (the main search phase) when there's not enough memory. Eventually, some percentage of the quantized index will be swapped out, which will slow down the search. If we have to load all full-precision vectors into memory, that kind of defeats the purpose of quantization. I'm wondering if there is a way to access full-precision vectors without interfering with the space of the quantized vectors.
  • The latency could be better. With oversample=1.5 (second dot) for 4_bit, we have around the same latency and recall as the baseline. Although one can argue that we save memory compared to the baseline, with the new access pattern of two-phase search that saving might be diminished. Otherwise it seems to have little benefit over just using plain HNSW.

}
Weight weight = indexSearcher.createWeight(rewritten, ScoreMode.COMPLETE_NO_SCORES, 1.0f);
HitQueue queue = new HitQueue(k, false);
for (var leaf : reader.leaves()) {
Contributor

Should this be switched to parallel execution similar to AbstractKnnVectorQuery?

Contributor Author

@dungba88 dungba88 Nov 27, 2024


Good question. I was using a single thread as a simple version to benchmark the latency first, since multi-threading could add some overhead as well. This class only does vector loading and similarity computation for a small set of vectors (k * oversample), so it's not as CPU-intensive as AbstractKnnVectorQuery.

I'll also try multi-threading and run the benchmark again. From the benchmark below, the re-ranking phase only adds a trivial amount of latency, so it might not help much. Also, the benchmark code seems to force-merge so there's only a single partition; we need to change it so that there are multiple partitions.

@dungba88
Contributor Author

dungba88 commented Nov 27, 2024

Edit: My previous benchmark was wrong because the vectors were corrupted.

The first benchmark shows the recall improvement for each oversample factor with reranking. It now aligns with what was produced in #13651.

[benchmark chart: recall for each oversample factor, with reranking]

The second benchmark compares the latency across all algorithms. We are still adding only a small amount of latency for the reranking phase.

[benchmark chart: latency across all algorithms]

For the last benchmark, I just ran oversampling without reranking, but still cut off at the original k (so it acts similarly to fanout). This is just to make sure that the reranking phase actually adds value. As expected, the recall does not improve much compared to reranking.

[benchmark chart: recall for oversampling without reranking]

NOTE: The dots in all benchmarks represent the oversample factor, with values of 1, 1.5, 2, 3, 4, 5. An oversample of 1 means no over-sampling. See https://github.com/mikemccand/luceneutil/blob/main/src/main/knn/KnnGraphTester.java#L833-L834

@dungba88 dungba88 changed the title Add Query for reranking KnnFloatVectorQuery Add Query for reranking KnnFloatVectorQuery with full-precision vectors Nov 27, 2024
@dungba88
Contributor Author

Also, this is the luceneutil branch I used for benchmarking: https://github.com/dungba88/luceneutil/tree/dungba88/two-phase-search, which incorporates the test for the BQ implementation by @benwtrent and the two-phase search.

Comment on lines 107 to 113
float expectedScore = VECTOR_SIMILARITY_FUNCTION.compare(targetVector, docVector);
Assert.assertEquals(
"Score does not match expected similarity for doc ord: " + scoreDoc.doc + ", id: " + id,
expectedScore,
scoreDoc.score,
1e-5);
}

We can test that the results are sorted by exact distance.

Maybe we can also test that the result of the same query with oversampling will be at least the same or better than without oversampling? By "better" I mean we should have higher recall. But I'm not sure if it's deterministic.


Thinking again, the docs should be sorted by ord, so my first point should be irrelevant.

Comment on lines 63 to 64
HitQueue queue = new HitQueue(k, false);
for (var leaf : reader.leaves()) {
Contributor

@shubhamvishu shubhamvishu Nov 28, 2024

Here we have access to IndexSearcher#getTaskExecutor and could use it to parallelize the work across segments (like we did earlier with some other query rewrites). But the HitQueue here isn't thread-safe. I don't know if using concurrency after making insertWithOverflow thread-safe would really be helpful, since it looks like the added cost is cheap? Or maybe it will be?

Contributor Author

That's right. In order to apply parallelism we need to use a per-segment queue, then merge them like in AbstractKnnVectorQuery.mergeLeafResults. I think the added latency is already low, but I still want to try it and see if it helps.
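
A rough sketch of that direction (not this PR's code), assuming a hypothetical per-leaf helper rerankLeaf(...) that fills a per-segment queue and returns its TopDocs:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.TaskExecutor;
import org.apache.lucene.search.TopDocs;

// run the rerank per segment on IndexSearcher's task executor, then merge the
// per-segment results like AbstractKnnVectorQuery.mergeLeafResults does
TaskExecutor executor = indexSearcher.getTaskExecutor();
List<Callable<TopDocs>> tasks = new ArrayList<>();
for (LeafReaderContext leaf : reader.leaves()) {
  // rerankLeaf is hypothetical: it loads full-precision vectors for this
  // segment's candidates and scores them into a per-segment HitQueue
  tasks.add(() -> rerankLeaf(leaf));
}
List<TopDocs> perLeaf = executor.invokeAll(tasks);
TopDocs merged = TopDocs.merge(k, perLeaf.toArray(TopDocs[]::new));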

@github-actions
Contributor

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Dec 14, 2024
@mikemccand
Member

I think this is a nice overall approach, adding a new RerankKnnFloatVectorQuery that wraps a KNN query that used quantization to get the initial results.

It's reminiscent of Lucene's existing QueryRescorer, to implement multi-phased ranking, except that class doesn't wrap another Query... maybe it should (separately)!
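
For comparison, a minimal sketch of how the existing QueryRescorer is driven today (the queries, sizes, and combine policy below are illustrative):

TopDocs firstPass = searcher.search(firstPassQuery, 200);
TopDocs reranked =
    new QueryRescorer(secondPassQuery) {
      @Override
      protected float combine(
          float firstPassScore, boolean secondPassMatches, float secondPassScore) {
        // illustrative policy: prefer the second-pass score when it matched
        return secondPassMatches ? secondPassScore : firstPassScore;
      }
    }.rescore(searcher, firstPass, 100);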

I'm curious about your results here -- why is recall better for 1bit and 4bit than 7bit, when reranking?

@github-actions github-actions bot removed the Stale label May 1, 2025
@dungba88
Contributor Author

dungba88 commented May 5, 2025

I'm curious about #14009 (comment) -- why is recall better for 1bit and 4bit than 7bit, when reranking?

The graph is a bit confusing, but the dots are the oversample factors (from 1 to 5). If we compare recall at the same oversample, then 7-bit is always better or the same. The difference becomes smaller at higher oversample. E.g., at oversample=1, 7-bit has 20% higher recall than 1-bit, but at oversample=5 they are mostly the same.


* GITHUB#13285: Early terminate graph searches of AbstractVectorSimilarityQuery to follow timeout set from
IndexSearcher#setTimeout(QueryTimeout). (Kaival Parikh)
Contributor Author

Seems like my IDE automatically removes extra spaces. If there is an objection I'll revert that in the next rev, along with other feedback.

to speed up computing the number of hits when possible. (Lu Xugang, Luca Cavanna, Adrien Grand)

* LUCENE-10422: Monitor Improvements: `Monitor` can use a custom `Directory`
* LUCENE-10422: Monitor Improvements: `Monitor` can use a custom `Directory`
Contributor

Looks like a lot of unrelated changes, probably needs a merge from main?

Contributor Author

It's due to my IDE automatically removing extra white space; I will revert it in the next rev.

Member

This problem is fixed for all files in main now, and any new trailing whitespace will fail the CI build.

import org.apache.lucene.index.IndexReader;

/**
* A Query that re-scores another Query with a DoubleValueSource function and cut-off the results at
Contributor

This sounds generic to any query, but rewrite always returns a KnnFloatVectorQuery?

Contributor Author

@dungba88 dungba88 Jun 13, 2025

The createRewrittenQuery method would return a DocAndScoreQuery, which is currently internal to KNN, but from the client/API point of view it's the same as any other Query. Moreover, it only contains the doc IDs and their respective scores.

We can extract createRewrittenQuery into a separate class for more reusability if needed. I can put up a prerequisite PR if that makes sense.

@vigyasharma
Contributor

Thanks for the explanations, @dungba88. I suppose the scenario you're trying to solve for is when users want to change the matchset of a KnnVectorQuery using full-precision or other reranking. I'm open to this change if it's a valid requirement.

My perception is that it should be solvable by plumbing vector query results into a rescorer, then combining the top-k hits with other (lexical) hits. For example, is this also a problem for hybrid search in OpenSearch/Elasticsearch? I suspect they might have independent queries for both, with some way to combine results.

@dungba88
Contributor Author

is when users want to change the matchset of a KnnVectorQuery using full-precision or other reranking

Yes, that's correct, @vigyasharma. We are using a hybrid search where KnnFloatVectorQuery and TermQuery (amongst others) are combined into a single BooleanQuery. Thus it is important to be able to change the matchset of the KnnFloatVectorQuery individually.

For hybrid search in OpenSearch/Elasticsearch, I'm wondering if @jmazanec15 and @benwtrent have any input. I have a feeling that it's quite common to combine lexical + KNN matching into a single BooleanQuery.

@benwtrent
Member

kNN queries are completed in the rewrite phase; if any rescoring needs to be done, it should be done during that phase.

I would expect the experience to be:

RescoreQueryWithVectorQueryThingy(KnnQuery), and the rescore will occur during rewrite (or at least provide a scorer that iterates the kNN query results, calculating the higher-fidelity scores).

kNN should be "just another query" and should be combinable with any other query. I realize this is a bit tricky as kNN is unique in that it effectively "collects" its results up-front.

Contributor

@vigyasharma vigyasharma left a comment

Thanks for persisting on this @dungba88, changes look good. I have a few suggestions but this looks almost ready!

scoreDocs[i++] = topDoc;
}
TopDocs topDocs =
new TopDocs(new TotalHits(queue.size(), TotalHits.Relation.EQUAL_TO), scoreDocs);
Contributor

Instead of setting this to the configured n, should we retain the total number of hits and the relation from the original query?

Contributor Author

I have no preference, but I can't find where Scorer or Weight would expose the relation of the original query.

Contributor Author

I used the original result count as totalHits, but kept the relation as EQUAL_TO, as it doesn't seem to be exposed anywhere. Usually this would only be visible via search(query, n) with a TopDocsCollector.
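
i.e. roughly (originalCount standing in for the inner query's reported hit count):

// keep the inner query's hit count, but leave the relation as EQUAL_TO
TopDocs topDocs =
    new TopDocs(new TotalHits(originalCount, TotalHits.Relation.EQUAL_TO), scoreDocs);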

I also realized that this totalHitsRelation value is not used anywhere, so it would not matter.

@dungba88
Contributor Author

Sorry for spamming the replies! I should have gone to the Files changed tab, which allows sending all replies in the same message.

@jmazanec15
Contributor

For hybrid search in OpenSearch/Elastic Search, I'm wondering if @jmazanec15 and @benwtrent have any input. I'm having a feeling that it's quite common to combine lexical + KNN matching into a single BooleanQuery.

Not sure I'm following the discussion completely. It is common to combine lexical and k-NN in a boolean query, but I think there is a lot of variety in what/how users are implementing hybrid search, so flexibility is great to have. Like @benwtrent mentioned, result computation for k-NN is done upfront, but queries can also be used to re-score after the initial phase via the QueryRescorer (as @mikemccand mentioned a while ago), so I don't think a separate rescorer is necessary.

I also like the point around lazy iteration: "or at least provide a scorer that iterates the kNN query results calculating the higher fidelity scores". For expensive re-scoring (which I think multi-vector will be), this might be nice to have for hybrid/boolean queries too - I think this approach is taken in FloatVectorSimilarityQuery. But this can probably be left for future consideration.

@dungba88
Contributor Author

result computation for k-NN is done upfront, but queries can also be used to re-score after the initial phase via the QueryRescorer (as @mikemccand mentioned awhile ago), so I dont think a separate rescorer is necessary.

Rescorer can be used, but IIUC Rescorer works only in the collection phase: after the first-pass collection we would rescore the final results. This would not work if we combine semantic and lexical matching into a single Query; in that case we could only rescore the combined matches. Like @benwtrent mentioned, rescoring should be done in the rewrite phase. This works both when semantic and lexical matching are combined and when semantic matching is used alone.

"or atleast provide a scorer that iterates the kNN query results calculating the higher fidelity scores"

This is also handled by this PR: technically not a Scorer but a DoubleValuesSource. Users can use it either to rescore with full-precision vectors, or even to use another field for rescoring (amongst other use cases, as DoubleValuesSource is extensible). A potential idea is to have a 1-bit vector field for matching and another 4-bit or 7-bit vector field for rescoring. However, the quantization cost of the query vector, even for scalar 7-bit, is a bit high. We will tackle that as a future optimization to this new Query.
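
As a sketch of the two-field idea (the field names are hypothetical, and the exact RerankKnnFloatVectorQuery constructor taking a DoubleValuesSource is assumed here for illustration):

// match on a coarse 1-bit quantized field, rescore against a higher-fidelity field;
// DoubleValuesSource.similarityToQueryVector is the existing core factory
KnnFloatVectorQuery match = new KnnFloatVectorQuery("vector_1bit", target, 2 * k);
DoubleValuesSource rescore =
    DoubleValuesSource.similarityToQueryVector(target, "vector_7bit");
Query reranked = new RerankKnnFloatVectorQuery(match, rescore, k);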

Also thanks @vigyasharma for approving the PR! Can you help to merge it if there is no objection?

@vigyasharma
Contributor

Also thanks @vigyasharma for approving the PR! Can you help to merge it if there is no objection?

Yes, I'll merge in these changes tonight. Was waiting a day to allow people to give feedback on the latest revision if they want to.

@github-actions github-actions bot modified the milestones: 11.0.0, 10.3.0 Jun 28, 2025
@vigyasharma vigyasharma merged commit 3404496 into apache:main Jun 28, 2025
@vigyasharma
Contributor

@dungba88 – Can this change go in 10.3 instead of waiting for 11.0? I didn't see anything blocking so I updated the changes entry. But I'm running into some merge issues while backporting, likely because of the DocAndScoreQuery refactor?

If you would like, raise a separate PR for the 10.3 backport (against branch_10x) and I can help review it. Or if this can only go in 11.0, then I'll revert the changes entry. Let me know.

@dungba88
Contributor Author

It should be possible to backport to 10.3. I'll raise a PR. Thanks @vigyasharma for merging!

@dungba88
Contributor Author

I put a backport PR to 10.3 here: #14860
